1. Abstract

Note: all files from this project are available on GitHub.

Herein, we investigate the relationship between public opinion on immigration and U.S. immigration policy from 2004 to 2024. We use Google Trends search interest as a proxy measure of public opinion. We decompose the primary research question into three sub-questions, measuring immigration policy by election outcomes, deportations, and encounters at the border. We establish a correlation between Google Trends search interest for immigration and electoral swings over our selected time period. This correlation has grown stronger over time. The relationship between search interest and policy remains nuanced and relatively elusive.

2. Introduction

Immigration has become an increasingly divisive issue in the United States. In the 2024 election cycle, immigration emerged as a driving factor in the Democratic Party’s loss.

While popular discourse has made clear that the American voting public is skeptical of the government’s approach to immigration policy, it is unclear whether this skepticism is empirically well-founded. Furthermore, the direct impact of voting on immigration policy is unclear. We establish three research questions in this paper to analyze how public opinion towards immigration has changed over time and shaped policy in the US:

  1. How is the change in public sentiment towards immigration (measured via Google Trends interest) correlated with presidential election results?
  2. How are presidential election results correlated with immigration policy (measured via deportations and encounters at the border)?
  3. How has public sentiment towards immigration in the US affected immigration policy over time (measured via deportations and encounters at the border)?

In the modern internet-driven media ecosystem, Google Trends may serve as a valuable measure of public sentiment towards immigration in the US. Though not an actual measure of opinion, these data could provide important geographical context not otherwise available with existing measures (e.g., public opinion polls).

By answering these questions, we hope to provide insight to members of the media, voters, and policy-makers into the interplay between public opinion on immigration and policy.

3. Data Sources

We make use of three datasets in this paper.

3.2 Harvard county-level election data

This dataset is released under a CC0 1.0 Universal license, which is highly permissive. The dataset contains county-level election results by year, office, and party in the United States.

3.3 Border encounters and deportation data

This dataset is also released under a CC0 1.0 Universal license, which is highly permissive. It contains individual-level arrests and deportations by ICE and Border Patrol in the United States.

All data were collected by Dowland Aiello.

Note (Winston Qi): I manually downloaded the deportations and apprehensions datasets from the website of the original source and converted the .xlsx files to .csv files online. I did this because of technical issues loading my groupmate’s data that I could not resolve before finishing the work on my research question. The data still correspond to the datasets described above.

4. Data

4.2 County-level Election Data

In its raw form, this dataset contains 72,618 rows and 13 columns. Each row represents the number of votes for a candidate in a given county running under a specified party for a specified office in a specified year. Relevant columns include:

  • year - represented by a 4-digit integer
  • state - fully expanded string of the form "ALABAMA"
  • county_name - shortened uppercase county-name in the form "AUTAUGA"
  • office - string of the form "US PRESIDENT"
  • party - string of the form "REPUBLICAN"
  • candidatevotes - integer
  • totalvotes - integer

NA values are encoded explicitly as the value NA.

4.3 Border encounters and deportation data

This dataset is extremely large and requires significant cleaning. Even after cleaning, the resulting table has several hundred thousand rows. To enable file sharing on GitHub, we split this table into multiple compressed .csv.bz files with a maximum of 100,000 rows each. Each table contains 30 columns. Relevant columns include:

  • Port of Departure - string indicating the city and state from which an individual was deported. This is relevant to our analysis and is worth merging with county names in the electoral and trends datasets.
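The chunking step described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the project's actual script: the helper names `split_rows` and `to_bz2_csv` are hypothetical, the rows are made up, and a small row cap stands in for the 100,000-row limit.

```python
import bz2
import csv
import io

def split_rows(rows, max_rows):
    """Yield successive lists of at most max_rows rows."""
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]

def to_bz2_csv(header, rows):
    """Serialize a header plus rows to bzip2-compressed CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return bz2.compress(buffer.getvalue().encode("utf-8"))

# Toy data (made up): 250 rows, split with a cap of 100 instead of 100,000.
header = ["id", "Port of Departure"]
rows = [[i, "EXAMPLE CITY, TX"] for i in range(250)]
chunks = [to_bz2_csv(header, part) for part in split_rows(rows, 100)]
print(len(chunks))  # number of compressed chunk files that would be written
```

Repeating the header in every chunk keeps each file readable on its own, at the cost of a few redundant bytes per file.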

The border encounters dataset is similarly large. We split the table into multiple tables with a maximum of 10,000 rows each. Each table contains 9 columns. Relevant columns include:

  • FY - year of observation, formatted as "FY" followed by a four-digit year. This is useful for calculating the change in border apprehensions over time.

NA values are encoded explicitly as expected in both datasets.

5. Method

5.1 Data Cleaning

See the scripts/ folder for utilities we wrote to clean our data. All data cleaning was performed by Dowland Aiello.

5.1.2 Harvard Election Data

Harvard election data were filtered to include only rows where office == "US PRESIDENT". No NA values were present in the relevant vote count and office columns. The data were fairly clean.

5.1.3 ICE / CBP Data

Apprehensions data were originally provided in .xlsx (Excel) format, and each table was several hundred thousand rows long. We split each table into multiple compressed .csv.bz files in order to upload it to GitHub. We elaborate on data cleaning for these tables in the relevant sections.

5.2 Analysis

To answer our three research questions, we created line plots and geographical visualizations of Google search interest, election results, and arrests / deportations by CBP and ICE.

5.2.1 Search Interest and Electoral Analysis

In our analysis, we aim to determine whether there is any correlation between search interest for “immigration” and the electoral swing. We restricted our analysis to the 2004 - 2012, 2008 - 2016, and 2016 - 2020 election cycles. To generate these visualizations, we calculate:

  • The electoral swing for the selected timeframes
  • The change in search query incidence for the selected timeframes

5.2.1.1 Relevant Variables

We make use of most variables in the electoral dataset. We discard redundant and miscellaneous columns:

  • "state_po" and "county_fips" - alternate names / geographical identifiers
  • "version" and "mode" - irrelevant metadata

In order to calculate a per-county swing, we pivot the table to wide format.
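As a rough illustration of the filter-and-pivot step, the following Python sketch turns long-format rows into one wide record per county. The column names follow section 4.2; the function name `pivot_wide`, the "OTHER OFFICE" value, and all vote counts are made up for the example and are not taken from the project's code.

```python
from collections import defaultdict

# Toy rows in the long format described in section 4.2 (vote counts are made up).
long_rows = [
    {"year": 2004, "state": "ALABAMA", "county_name": "AUTAUGA",
     "office": "US PRESIDENT", "party": "REPUBLICAN", "candidatevotes": 150},
    {"year": 2008, "state": "ALABAMA", "county_name": "AUTAUGA",
     "office": "US PRESIDENT", "party": "REPUBLICAN", "candidatevotes": 175},
    {"year": 2008, "state": "ALABAMA", "county_name": "AUTAUGA",
     "office": "OTHER OFFICE", "party": "REPUBLICAN", "candidatevotes": 999},
]

def pivot_wide(rows, office="US PRESIDENT"):
    """Return {(state, county): {"PARTY_YEAR": votes}} for a single office."""
    wide = defaultdict(dict)
    for row in rows:
        if row["office"] != office:
            continue  # keep presidential rows only
        key = (row["state"], row["county_name"])
        column = f"{row['party']}_{row['year']}"
        wide[key][column] = wide[key].get(column, 0) + row["candidatevotes"]
    return dict(wide)

wide = pivot_wide(long_rows)
print(wide[("ALABAMA", "AUTAUGA")])
```

With one record per county, each party-year combination becomes a column, so differences between years reduce to simple column arithmetic.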

5.2.1.2 Swing Calculation

Taking advantage of the pivoted data, we can calculate a row-wise “electoral swing” by subtracting the relevant vote columns. For example, to calculate the swing from 2004 - 2008 for all counties, we subtract the REPUBLICAN_2004 column from the REPUBLICAN_2008 column.
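The subtraction can be written out as a small Python sketch. The function name `swing` and the vote counts are hypothetical; only the REPUBLICAN_2004 / REPUBLICAN_2008 column naming comes from the text.

```python
def swing(wide_row, party, start_year, end_year):
    """Return votes(end_year) - votes(start_year) for one county, or None if a column is missing."""
    start = wide_row.get(f"{party}_{start_year}")
    end = wide_row.get(f"{party}_{end_year}")
    if start is None or end is None:
        return None  # the county lacks data for one of the two years
    return end - start

# Toy county record in wide format (made-up vote counts).
county = {"REPUBLICAN_2004": 15000, "REPUBLICAN_2008": 17500}
print(swing(county, "REPUBLICAN", 2004, 2008))  # 2500
```

A positive value means the party gained votes in that county between the two elections. Returning None for missing years avoids silently treating an absent county-year as zero votes.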

In order to plot electoral results against search query interest, we calculate a “swing” similarly for the Google Trends dataset using the DMA and query_incidence columns. We make use of a similar pivoting technique.

5.2.1.3 Geocoding

Notably, the Google Trends dataset formats county names slightly differently from the electoral dataset. To account for this, we greedily match rows through fuzzy joining.
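One way to picture greedy fuzzy matching is the Python sketch below, which uses `difflib.get_close_matches` from the standard library. It is a stand-in for whatever fuzzy-join routine the project used; the function name `match_names`, the cutoff of 0.6, and the example names are assumptions for illustration.

```python
import difflib

def match_names(trend_names, county_names, cutoff=0.6):
    """Greedily pair each trends name with its closest still-unused county name."""
    remaining = list(county_names)
    pairs = {}
    for name in trend_names:
        best = difflib.get_close_matches(name.upper(), remaining, n=1, cutoff=cutoff)
        if best:
            pairs[name] = best[0]
            remaining.remove(best[0])  # greedy: each county is matched at most once
    return pairs

pairs = match_names(
    ["Autauga County", "Baldwin County"],
    ["AUTAUGA", "BALDWIN", "BARBOUR"],
)
print(pairs)
```

Because the matching is greedy, the result can depend on input order, and an early poor match can block a better later one. Spot-checking a sample of the pairs is a reasonable safeguard.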

However, in order to generate geographical plots, we must also generate coordinates for each county. We do so using the Google geocoding API.

5.2.2 ICE / CBP Analysis

5.2.2.1 Data Cleaning for Border Encounters Data

In this step, I organized the border encounters dataset. I grouped the dataset by Fiscal.Year and Encounter.Type and summed encounter.count, ignoring missing values, to compute the total number of encounters of each type per year. I then reshaped the data into wide form so that each Encounter.Type became its own column holding the corresponding total count.
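The group, sum, and reshape sequence can be sketched in Python as follows. The column names Fiscal.Year, Encounter.Type, and encounter.count come from the description above; the function name `summarize_encounters` and the counts are made up for illustration.

```python
from collections import defaultdict

def summarize_encounters(rows):
    """Return {fiscal_year: {encounter_type: total_count}}, skipping missing counts."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        count = row.get("encounter.count")
        if count is None:
            continue  # ignore missing values rather than counting them as zero rows
        totals[row["Fiscal.Year"]][row["Encounter.Type"]] += count
    # The nested dict is already "wide": one record per year, one key per encounter type.
    return {year: dict(by_type) for year, by_type in totals.items()}

# Toy rows (made-up counts).
rows = [
    {"Fiscal.Year": 2020, "Encounter.Type": "Apprehensions", "encounter.count": 10},
    {"Fiscal.Year": 2020, "Encounter.Type": "Apprehensions", "encounter.count": 5},
    {"Fiscal.Year": 2020, "Encounter.Type": "Expulsions", "encounter.count": None},
    {"Fiscal.Year": 2020, "Encounter.Type": "Expulsions", "encounter.count": 7},
    {"Fiscal.Year": 2021, "Encounter.Type": "Inadmissibles", "encounter.count": 3},
]
summary = summarize_encounters(rows)
print(summary)
```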

5.2.2.2 Merging Data

To analyze the relationship between public sentiment and immigration enforcement, I merged the Google Trends data and the border encounters data using a left_join() on the shared variable Fiscal.Year; the Google Trends dataset had already been converted from date ranges to fiscal years. The merged summary data combine the average search interest from Google Trends with the total number of apprehensions, expulsions, and inadmissibles for each corresponding year.
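The two ideas in this step, mapping dates to fiscal years and left-joining on the year, can be sketched in Python. This is not the project's left_join() call: the `fiscal_year` helper assumes the US federal definition (the fiscal year starts on 1 October), and the column name `avg_interest` and all values are hypothetical.

```python
from datetime import date

def fiscal_year(d):
    """US federal fiscal year: FY N runs from 1 Oct of year N-1 through 30 Sep of year N."""
    return d.year + 1 if d.month >= 10 else d.year

def left_join(left, right, key):
    """Keep every left row; attach matching right columns, or None when there is no match."""
    index = {row[key]: row for row in right}
    right_cols = sorted({col for row in right for col in row if col != key})
    joined = []
    for row in left:
        match = index.get(row[key], {})
        joined.append({**row, **{col: match.get(col) for col in right_cols}})
    return joined

# Toy data (made-up values).
trends = [{"Fiscal.Year": 2020, "avg_interest": 40.0},
          {"Fiscal.Year": 2021, "avg_interest": 35.0}]
encounters = [{"Fiscal.Year": 2021, "Apprehensions": 1200}]
merged = left_join(trends, encounters, "Fiscal.Year")
print(merged)
```

A left join keeps every year present in the trends data, so years without enforcement data show up as explicit missing values instead of disappearing from the table.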

5.2.3 ICE, CBP, and Election Results Analysis

5.2.3.1 Dataset Loading for Manipulation

5.2.3.1.1 Election results 2000-2020 Dataset

5.2.3.1.2 Annual apprehensions with place of origin (2000-2022)

I wrote a for loop to extract the individual datasets from the aggregated data list, since the file names, “Family Units apprehended along the SWB FY20XX Redacted_raw.csv” (with 20XX standing for the respective year), were fairly uniform. The datasets cover the years 2000-2022. I also built a dataframe recording the original dimensions of each dataset. For the 2000-2006 datasets, the variable names as loaded are “U.S. Border Patrol Nationwide Apprehensions” followed by “X”, “X.1”, and so on up to “X.6”. The 2007-2015 datasets have all of these variables plus “X.7”, and the 2016-2022 datasets additionally include “X.8” and “X.9”. On manual inspection, these are all placeholder names; the actual variable names are written on line 6 of each file. Although there are numerous variables, I focused only on the fiscal year (“FY”) variable because, as with the deportations datasets, the others were not relevant enough to my research question. Each row represents an individual apprehended by the USBP in a given year, along with general personal and geographical information about them. Dimensions for each dataset are shown below.

5.2.3.1.3 Removals (Deportations) Datasets (2011-2023)

There are some gaps in the years (e.g., 2017 and 2020-2021 are missing) because no datasets for those years were available on the source website, but general trends in deportations should still be visible in the years that are available. The variables are not entirely consistent across datasets, so I used only the Departure Date variable from each, both for consistency and because it records when each individual was deported. I originally planned to analyze variables such as Birth Country and Citizenship Country, but they did not help answer my research question of how election results affect immigration policy as much as I had expected. I instead focused on the number of deportations in a given year to track the impact of immigration policies. Each row represents an individual deported by ICE within a certain period, with tracking, personal, and geographical information identifying the individual’s case. Dimensions for each dataset are shown below.

5.2.3.2 Date Separation Function

I created a generic function to separate the date column of a dataframe into its components when needed. I later discarded the other newly created date columns and kept only the year, as I wanted to capture larger trends across the periods between elections rather than day-by-day or month-by-month changes.
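A date-separation helper of this kind might look like the Python sketch below. The name `date_sep` mirrors the function mentioned later in this section, but the signature, the returned keys, and the "%m/%d/%Y" input format are assumptions; the actual source files may use a different date format.

```python
from datetime import datetime

def date_sep(value, fmt="%m/%d/%Y"):
    """Split a date string into full date, year, month, and day components."""
    parsed = datetime.strptime(value, fmt)
    return {
        "date": parsed.date(),
        "year": parsed.year,
        "month": parsed.month,
        "day": parsed.day,
    }

# Toy departure dates (made up); only the year is kept for the yearly trend.
departure_dates = ["03/15/2019", "11/02/2019", "01/20/2022"]
years = [date_sep(d)["year"] for d in departure_dates]
print(years)  # [2019, 2019, 2022]
```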

5.2.3.3 Election Results Filtering

I checked that there were no NA values or other placeholder values in the total votes, then summarized the Democratic and Republican vote totals by year in a single dataframe. I considered only Republicans and Democrats, as they are the two main parties that win US elections, and the share of votes going to other parties is small by comparison. I pivoted the data to wide format to make the per-year vote totals by party easier to interpret visually, then added a voting difference variable. The aggregate totals make it easy to compare the number of votes each party received, while the difference shows the general party voting trends in the United States and which party is winning the popular vote.
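The per-year party totals and the difference variable can be sketched in Python. The difference is computed as Republican minus Democrat, matching the convention used in the results; the function name `party_totals` and the vote counts are made up.

```python
from collections import defaultdict

def party_totals(rows):
    """Return {year: {"DEMOCRAT": n, "REPUBLICAN": m, "difference": m - n}}."""
    totals = defaultdict(lambda: {"DEMOCRAT": 0, "REPUBLICAN": 0})
    for row in rows:
        if row["party"] in ("DEMOCRAT", "REPUBLICAN"):  # other parties are left out
            totals[row["year"]][row["party"]] += row["candidatevotes"]
    return {
        year: {**t, "difference": t["REPUBLICAN"] - t["DEMOCRAT"]}
        for year, t in sorted(totals.items())
    }

# Toy county-level rows (made-up vote counts).
rows = [
    {"year": 2004, "party": "REPUBLICAN", "candidatevotes": 60},
    {"year": 2004, "party": "DEMOCRAT", "candidatevotes": 55},
    {"year": 2004, "party": "GREEN", "candidatevotes": 2},
    {"year": 2008, "party": "REPUBLICAN", "candidatevotes": 58},
    {"year": 2008, "party": "DEMOCRAT", "candidatevotes": 70},
]
totals = party_totals(rows)
print(totals)
```

A positive difference indicates a Republican lead in the summed vote for that year, and a negative difference a Democratic lead.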

5.2.3.4 Border Status Differentiation

I used the same cleaning methods as in the section above, but also created new categorical columns (border status, border state, etc.) to record whether a state lies on a border of the United States, and to break votes down by border status and political party. Comparing the voting trends of border and non-border states helps answer one of the sub-questions: whether people in border states are more likely to vote Republican for that party’s immigration policies because immigration is a closer and more tangible issue for them.
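A border-status column of this kind can be derived from a lookup set, as in the Python sketch below. The set used here is deliberately only an illustrative subset (the four states on the US-Mexico border); the report's own classification also counts states on the Canadian border. The function names and vote counts are hypothetical.

```python
from collections import defaultdict

# Illustrative subset only: the four states on the US-Mexico border.
# The report's classification additionally includes Canadian-border states.
EXAMPLE_BORDER_STATES = {"ARIZONA", "CALIFORNIA", "NEW MEXICO", "TEXAS"}

def border_status(state, border_states=EXAMPLE_BORDER_STATES):
    """Label a state as 'border' or 'non-border' using the given lookup set."""
    return "border" if state.upper() in border_states else "non-border"

def difference_by_status(rows):
    """Sum (Republican - Democrat) votes per (year, border status)."""
    totals = defaultdict(int)
    for row in rows:
        sign = {"REPUBLICAN": 1, "DEMOCRAT": -1}.get(row["party"])
        if sign is None:
            continue  # skip other parties
        totals[(row["year"], border_status(row["state"]))] += sign * row["candidatevotes"]
    return dict(totals)

# Toy rows (made-up vote counts).
rows = [
    {"year": 2016, "state": "TEXAS", "party": "REPUBLICAN", "candidatevotes": 60},
    {"year": 2016, "state": "TEXAS", "party": "DEMOCRAT", "candidatevotes": 40},
    {"year": 2016, "state": "OHIO", "party": "REPUBLICAN", "candidatevotes": 50},
    {"year": 2016, "state": "OHIO", "party": "DEMOCRAT", "candidatevotes": 55},
]
by_status = difference_by_status(rows)
print(by_status)
```

Because raw vote differences scale with population, comparing vote shares rather than counts would be a natural refinement when the two groups differ greatly in size.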

5.2.3.5 Deportations Filtering and Merging

I selected only the departure dates, for consistency and because they record when individuals were deported, and discarded other variables such as Birth Country and Citizenship Country. I had originally planned to analyze those variables, but later decided that they did not help answer my research question of how election results affect immigration policy as much as I had expected. I instead focused on the total number of deportations in a given year to track the impact of immigration policies. I used the date_sep function to split the dates for each deportation, then bound the datasets together into a total deportations dataset, using sample_n and head to spot-check the result. I discarded the complete date, month, and day columns created by date_sep, keeping only the year, and counted the number of rows for each year. There were some gaps in the years (e.g., 2017 and 2020-2021) because the source had no datasets in the same category for those years, but I decided to continue, since general trends in deportations can still be seen in the years that are available.

5.2.3.6 Apprehensions Filtering and Merging

As in the section above, I kept only the year and discarded the other variables, in order to capture general yearly apprehension trends that reflect the impact and enforcement of immigration policies. The year values in the variable “U.S. Border Patrol Nationwide Apprehensions” did not include a month, day, or time, so I identified them by checking for the string “FY2”, combined all of the cleaned datasets, removed the “FY” prefix to leave just the year, and counted the number of apprehensions per year.
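The filter-and-count step can be sketched in Python as follows. The “FY2” check comes from the description above; the function name `apprehensions_per_year`, the extra digit check, and the sample values are assumptions for illustration.

```python
from collections import Counter

def apprehensions_per_year(values):
    """Count rows per year, keeping only values shaped like 'FY' plus digits starting with 2."""
    years = []
    for value in values:
        text = str(value).strip()
        if text.startswith("FY2") and text[2:].isdigit():
            years.append(int(text[2:]))  # drop the "FY" prefix, keep the year
    return Counter(years)

# Toy column values (made up), including header text and blanks that must be skipped.
column = [
    "U.S. Border Patrol Nationwide Apprehensions",
    "",
    "FY2019",
    "FY2019",
    "FY2020",
    "FY",
]
counts = apprehensions_per_year(column)
print(counts)
```

Filtering on the value pattern also drops the title and header lines that precede the real data in each file, which matters when the true variable names only appear several lines into the file.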

5.2.3.7 Aggregate Dataset Merging into Long Format

In this step, I merged, arranged, and pivoted to long format the cleaned core datasets, in preparation for the graphs and the analyses based on them.

5.2.4 Analysis

For my analysis, I first created graphs to visually inspect the data before deciding whether further or more complex analysis methods (e.g., slopes, linear regression) were needed. If the graphs clearly showed a trend, such as a positive or negative slope, or clearly showed no correlation, such methods would not be needed for confirmation.

I mainly used line graphs because my data are heavily numerical, with the exception of the bar plot of the election results.

5.2.4.1 Aggregate Line Graph of all Datasets

I first attempted to compare the number of deportations and apprehensions to votes by year separately, and so created two line graphs to see the general voting trends against the immigration-related data. At a glance, it is hard to distinguish any trends, given the sizable numerical gap between the election results and the deportation and apprehension counts. On the graph’s default scale, the latter two show almost no fluctuation and appear to sit extremely close to 0, which is not the case given the values described above. Additionally, the data cut off at several points: the election results end at 2020, and deportations have far fewer data points than the other series, along with some missing years. I decided to skip over the missing years in order to see general trends in the data.

These observations led me to break the graphs down into smaller, more focused comparisons, usually of only a couple of datasets at a time, in the results below.

6. Results

6.1 Search Query Interest vs Electoral Swing

6.1.1 Geographical Visualization

Using our manipulated dataset from section 5.2, we generate maps plotting the swing in search interest vs the electoral swing per county in the United States. We do so for the 2004 - 2012, 2008 - 2016, and 2016 - 2020 election cycles. Note that we use a cached copy of our geocoded dataset for convenience and reproducibility purposes.

6.1.1.1 Executive Visualization Decisions

In our geographical visualization, we opted to represent electoral swing with a diverging color scale. We represent a “right” swing with a red hue, while we represent a “left” swing with a blue hue. This choice is relatively standard among electoral maps. We use grey to represent no change. A diverging color scale is a natural choice to represent our results, as it gives zero-values a meaningful interpretation.
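To make the diverging scale concrete, here is a small Python sketch that maps a swing value to a hex color, with grey at zero, red for rightward swings, and blue for leftward swings. The specific RGB anchors, the function name `swing_color`, and the linear interpolation are assumptions for illustration, not the palette used in the maps.

```python
GREY = (190, 190, 190)  # no change
RED = (200, 30, 30)     # strongest "right" swing
BLUE = (30, 60, 200)    # strongest "left" swing

def swing_color(swing, max_abs):
    """Interpolate linearly from grey (swing = 0) toward red or blue (|swing| = max_abs)."""
    t = max(-1.0, min(1.0, swing / max_abs))  # clamp to [-1, 1]
    target = RED if t >= 0 else BLUE
    weight = abs(t)
    rgb = tuple(round(g + (c - g) * weight) for g, c in zip(GREY, target))
    return "#{:02x}{:02x}{:02x}".format(*rgb)

print(swing_color(0, 10))    # #bebebe (grey)
print(swing_color(10, 10))   # #c81e1e (red)
print(swing_color(-10, 10))  # #1e3cc8 (blue)
```

Using the same `max_abs` for every map keeps colors comparable across election cycles; rescaling per map would exaggerate small swings in quiet cycles.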

6.1.1.2 Visualizations

Notable in the above maps are:

  • Swing states
  • Polarization of elections over time

However, it is relatively difficult to interpret the relationship between search query incidence and the electoral swing.

6.1.1.3 Alternative Visualization: Scatter Plot w/ Linear Regression

In order to compare the relationship between query incidence and election results more explicitly, we generate a scatter plot with a linear regression of counties’ electoral swing and search query incidence in the same election cycles.
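For reference, the slope and intercept of such a regression line follow from ordinary least squares. The Python sketch below computes them directly; the function name `fit_line` and the data points are made up and do not reproduce our results.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy points (made up): x = change in search query incidence, y = electoral swing.
query_change = [0.0, 1.0, 2.0, 3.0]
electoral_swing = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(query_change, electoral_swing)
print(slope, intercept)  # 2.0 1.0
```

Comparing slopes across election cycles is only meaningful if both variables are measured on the same scale in each cycle.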

Note that the linear regression slope is positive in both the 2008 - 2016 and 2016 - 2020 election cycles. The slope appears slightly steeper in the 2016 - 2020 election cycle.

6.2 ICE / CBP Analysis

6.2.1 Plot 1: Public Sentiment Over Time

This diagram shows the average Google search interest in immigration across different years. Public attention peaked around 2012 and 2020, which aligns with the major election cycles involving Obama and Trump. The trend highlights how immigration became an important topic during politically charged periods but lost attention afterward. In addition, after 2020 we observe a decline in public interest in immigration, which was likely caused by the shift of public attention to COVID-19.

6.2.2 Plot 2: Immigration Enforcement Over Time

This line graph shows three types of U.S. immigration enforcement action (apprehensions, expulsions, and inadmissibles) over the four fiscal years from 2020 to 2023. The data show a rise in apprehensions, which surpassed expulsions by 2022, suggesting a shift in enforcement strategy by the government. In addition, inadmissible entries also increased during this period, which indicates growing challenges at the border for immigrants seeking to enter legally.

6.2.3 Plot 3: Search Interest vs. Apprehensions

This scatter plot shows a clear negative correlation between average Google search interest in immigration and the number of apprehensions at the border. As public attention increases, enforcement activity appears to decline. This trend may suggest a lagged response in policy implementation or indicate that heightened public discourse influences shifts in enforcement priorities over time.
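The strength and direction of such a relationship can be summarized with the Pearson correlation coefficient. The Python sketch below computes it from first principles; the function name `pearson_r` and the values are made up and chosen to be perfectly linear, so they do not reproduce our data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy yearly values (made up): interest falls while apprehensions rise.
interest = [50.0, 40.0, 30.0, 20.0]
apprehensions = [100.0, 200.0, 300.0, 400.0]
r = pearson_r(interest, apprehensions)
print(round(r, 6))  # -1.0
```

With only a handful of yearly observations, a coefficient like this is very sensitive to each individual point, so it is better read as descriptive than as evidence of a stable relationship.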

6.3 ICE, CBP, Election Results, National Border Voting, and Voting Difference Analysis

I explored correlations visually (via line and scatter plots) and descriptively, and made adjustments for improved visual comparison when appropriate (e.g., rescaling values or filtering certain years).

6.3.2 Total Vote Differences (Republican - Democrat)

The line graph shows an initial Republican trend, as the previous bar plot suggested, which flipped to a Democratic lead in 2008. Democrats have won the popular vote in every election in this period except 2004, yet still lost the presidency in 2000 and 2016; the Republican presidential wins in 2000, 2004, and 2016 appear as partial peaks in Republican voting in those years.

6.3.3 Apprehensions vs Deportations

While I started with a plot of the voting difference alongside apprehensions and deportations, the vote difference data turned out to be less applicable to the immigration enforcement data than I had expected. Election years are four years apart, making it hard to find a direct correlation with the other two series, which change yearly. Additionally, the numerical gap was still very wide despite the reduction in scale compared to my initial graph of all the datasets together, which limits visual comparison. Even for apprehensions, which had more years of data than deportations, it was hard to see election results reflected in any changes.

So I decided to focus on just comparing apprehensions and deportations.

The scaling is better, but deportations cover fewer than half of the years for which apprehensions data exist, so I filtered for the years that had data for both apprehensions and deportations in order to compare them more fairly.
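Restricting two yearly series to the years they share amounts to a set intersection on the year keys, as in this Python sketch. The function name `common_years` and all counts are made up for illustration.

```python
def common_years(series_a, series_b):
    """Return (year, a_value, b_value) tuples for years present in both series."""
    shared = sorted(set(series_a) & set(series_b))
    return [(year, series_a[year], series_b[year]) for year in shared]

# Toy yearly counts (made up); deportations are missing some years.
apprehensions = {2018: 400, 2019: 850, 2020: 400}
deportations = {2019: 270, 2020: 190, 2022: 70}
both = common_years(apprehensions, deportations)
print(both)  # [(2019, 850, 270), (2020, 400, 190)]
```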

I created a new dataframe meeting those requirements to produce this graph:

Aside from the sharp drop in both series after 2019, which lasted a couple of years and was most likely due to the COVID-19 pandemic, there did not seem to be any significant correlation or trend between apprehensions and deportations. The spike in both around 2019 may be attributable to the ICE raids carried out under the Trump administration at the time. Deportations seem to trend generally downwards from 2012-2023, and the same can be said for apprehensions over their longer timeframe (2000-2022). However, over the shorter timeframe (2010-2022), apprehensions seemed to be increasing.

6.3.4 Border and Non-Border Votes

Here I simply compared the total number of votes for each party that border and non-border states had, in order to see whether or not a state being on the US national border (including both Mexico and Canada) had an impact on whether they voted more Republican. This was under the initial assumption of immigration potentially being a more impactful/influential topic in regards to a state’s geographical proximity to the border and potentially harsher immigration policies supported/enacted as a result.

Plotting the voting difference (Republican - Democrat) by border status:

From the graphs, border states do not seem more likely to vote Republican on account of their assumed greater involvement with immigration; they follow voting trends similar to those of non-border states. In fact, non-border states seem to vote more Republican at times.

6.3.5 Analysis Summary

In summary, the key findings were:

  • Election Trends: Democrats won the popular vote in most years between 2000–2020 (all but 2004), but not always the presidency (e.g., 2000 and 2016).

  • Deportation Trends: Deportations generally decreased from 2012–2023, with a sharp drop post-2019.

  • Apprehensions: Sharp rise in 2019, coinciding with Trump administration raids, followed by a sharp drop during COVID similar to that in deportations.

  • Border vs. Non-Border States: Voting patterns do not differ significantly between these groups and instead follow very similar trends, contradicting the assumption that border states would lean more Republican.

In all, our findings illustrate that the initial assumption, that election results and border-state voting patterns would correlate with or predict immigration enforcement activity and policy, was not supported: the winning party did not necessarily follow the presumed enforcement patterns, and border states did not vote particularly differently from non-border states.

7. Discussion

7.1 Influence of Public Opinion on Electoral Outcomes

Our analysis reveals an increasingly strong correlation between search interest for immigration and the electoral swing in US elections from 2004 - 2020. We confirm this finding using a map visualization and linear regression analysis. Our linear regression analysis indicates a statistically significant correlation between the explanatory and dependent variables.

7.2 Influence of Public Opinion on Immigration Law Enforcement Activity

In my analysis, I noticed a negative relationship between search interest in immigration and immigration apprehensions, which I did not expect at the start. Specifically, over the period 2020-2023, a decrease in search interest coincided with an increase in apprehension encounters, which our team was not expecting. This implies that rising attention or concern is not promptly associated with an increase in enforcement. However, this negative relationship may be caused by policy lag, that is, the delay between a shift in public sentiment and the government’s enforcement of a change in policy. Governments are unlikely to react immediately or change policy as soon as they notice an increase in public attention to immigration; it takes time to change policy and for that change to reach the public.

7.3 Influence of Election Results on Immigration Law Enforcement Activity

Our analysis reveals that presidential election results do not appear to directly correlate with or predict immigration enforcement activity, as initially thought. While Republican administrations are generally assumed to favor stricter immigration policy, actual deportation and apprehension figures vary erratically from year to year and seem to be influenced by more than just electoral results, such as events like COVID-19. The spike in apprehensions and deportations alike in 2019 is a clear outlier, likely driven by targeted ICE activity and public raids, but both generally seem to be declining. The COVID-19 pandemic appears to have affected migration and enforcement alike, given the decrease in both apprehensions and deportations. Additionally, whether a state lies on one of the national borders does not seem to influence Republican voting as originally hypothesized; border states show voting trends similar to those of states that are not on the border.

7.4 Limitations

7.4.1 Research Question 1

Of note, our chosen metric of public opinion for immigration measures only magnitude, not directionality (like/dislike). Furthermore, we have no mechanism for establishing a causal relationship between search interest and election results due to confounding variables (most voters are presumably not single-issue voters).

7.4.2 Research Question 2

With regard to election results and immigration enforcement, the years did not align perfectly across datasets, and some years were missing; complete data might have allowed further insight into the yearly trends. Relevant events (e.g., Title 42 expulsions during COVID-19) are not explicitly captured in the data, which makes it difficult to account for their influence and limits the analysis that can be made without such context. Our research does not aim to establish causality from the data, merely to observe and highlight trends.

7.4.3 Research Question 3

The Google Trends dataset measures only attention, not opinion: a rise in search volume may indicate growing public interest in immigration, but it does not tell us whether that interest is positive or negative. Additionally, the data capture search behavior, but not direct public action such as voting, protesting, or advocating for immigration reform, which might provide more insight into the extent of public sentiment. The regional Google Trends data are based on relative scores (0–100), not absolute search volume, which makes it difficult to compare true magnitude across locations. A possible time lag between public sentiment and the government’s enforcement response may also cause results to differ from our expectations. For example, a time lag may have turned a positive relationship between public sentiment and apprehension encounters into an apparent negative one.

7.5 Future Work

7.5.1 Research Question 1

It may be helpful to establish a wider dataset incorporating search terms which reveal an individual’s attitude towards immigrants. For example, search terms like, “illegal alien,” “gang member,” “MS-13,” and others may provide helpful context. It may also be worth comparing search interest for immigration with other topics relevant to elections in the US (e.g., “deficit spending,” “China,” etc.). This expanded dataset could aid in establishing a direct causal link between search interest and electoral outcomes.

7.5.2 Research Question 2

With regard to election results and immigration enforcement, zeroing in on specific states and their voting patterns, and comparing voting patterns among states and/or counties that border Mexico, border Canada, or border neither, might be more insightful for observing immigration-related trends there. Other, more focused datasets on deportation or policy could be used, and the effects of specific immigration and border policies could be analyzed.

7.5.3 Research Question 3

Future research into the relation between public sentiment and apprehensions may incorporate analysis from social media or news articles to capture actual actions, not just attention. In addition, an expansion of the enforcement dataset to include more years would improve both the longitudinal analysis and utility of the research for the media, voters, and policy-makers.

8. Summary

We establish a correlation between search interest for immigration and electoral swings in the US that has become stronger over time. Google Trends and apprehensions data highlight a potential relationship between public sentiment and immigration enforcement in the US: increases in public interest in immigration correspond with lower immigration enforcement activity. While the direction of this relationship was not what we expected, we interpret it as a sign that public interest may still play a role in the government’s policy changes. Meanwhile, though electoral results offer a degree of political context, they are not good predictors of immigration enforcement trends. States that lie on the borders of the United States also do not necessarily vote more Republican in favor of harsher immigration policies, and instead mirror the general voting trends of non-border states and the nationwide popular vote as a whole. Ultimately, the relationship between public opinion and immigration enforcement in the US remains elusive and nuanced due to policy lag and numerous confounding outside variables and events, whose influences may be hard to discern. While not all research questions met their initial assumptions, we found correlations between public sentiment and both electoral swings and immigration enforcement.